1 Introduction

Spotify is one of the larger music streaming services available today, with 345 million active users 1. Instead of requiring listeners to buy CDs or download every song they want to hear, Spotify gives access to millions of songs without downloading them to a device.

Our project asks two questions. First, do certain features have a strong correlation with other features? Second, which genre is the most popular? We expect that certain features will be strongly correlated with other features, and that our data will point to a few genres that may be the most popular.

2 Background

The data we are using is based on Spotify data from 1921 to 2020 and includes over 175,000 audio tracks. We found our data on Kaggle 2. This dataset groups the data by artist, genre, and year. There are eleven different variables measured in the dataset: acousticness, danceability, duration, energy, liveness, instrumentalness, loudness, speechiness, valence, popularity, and tempo.

Energy is a perceptual measure of the intensity and activity of a track on a scale from 0.0 to 1.0. The perceptual features that contribute to it include dynamic range, perceived loudness, timbre, onset rate, and general entropy. Liveness ranges from 0.0 to 1.0 and detects whether an audience is present in a recording. If the liveness value is above 0.8, there is a strong likelihood that the track is live. Acousticness is a confidence measure of whether the track is acoustic. It varies from 0.0 to 1.0, with 1.0 representing high confidence that the track is acoustic. Loudness ranges from -60 to 0 and is measured in decibels (dB). It gives the overall loudness averaged over the entire track. Danceability combines tempo, rhythm stability, beat strength, and regularity. It rates how suitable a track is for dancing from 0.0 to 1.0, with 1.0 being the most danceable. Duration measures the length of the track in milliseconds (ms). The instrumentalness feature tracks whether a song contains vocals. “Ooh” and “aah” sounds are treated as instrumental in this context. Rap or spoken word tracks are considered vocal. Instrumentalness ranges from 0.0 to 1.0, with 1.0 being the most instrumental. Speechiness is the opposite of instrumentalness, measuring the relative length of the track containing any kind of human voice. The tempo feature gives the tempo of the track in beats per minute (BPM). Valence measures the positiveness of the track; higher valence corresponds to more cheerful and upbeat songs. Lastly, popularity is calculated by an algorithm based on the total number of plays the track has had and how recent those plays are.

In the rest of our report, we first group the genres into broader categories and then analyze the features across the genres. We also compare features to each other and test the correlation between each pair of features to see whether they have a strong linear relationship. Lastly, we want to discover which genres are the most popular by using t-tests to compare the genres' mean popularity. In the end, we hope to discover how popularity is related to different genres, as well as how different features relate to each other.

3 Analysis

3.1 Genre Condensation

There were 3232 genres. We condensed these into the top 20 occurring terms in these genres using regular expressions and counting the occurrences.
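As a rough illustration of this step, the sketch below splits each genre name into terms with a regular expression and counts how many genre names contain each term. The sample genre names, the pattern `[a-z]+`, and the helper name `count_terms` are assumptions made for this example; the exact expressions used in our analysis may differ.

```python
import re
from collections import Counter

# Hypothetical sample of original genre names (not the real dataset).
genres = ["african rock", "indie rock", "swedish pop", "dance pop",
          "rap", "swedish hip hop", "[]"]

def count_terms(genre_names):
    """Count how many genre names contain each alphabetic term."""
    counts = Counter()
    for name in genre_names:
        # Use a set so a term repeated inside one name is counted once.
        terms = set(re.findall(r"[a-z]+", name.lower()))
        counts.update(terms)
    return counts

term_counts = count_terms(genres)
# Keep the most frequent terms (3 here; the report keeps 20).
top_terms = [term for term, _ in term_counts.most_common(3)]
print(top_terms)
```

A genre name such as "[]" contributes no terms at all under this pattern, which is one way such entries can fall outside the condensed genres.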

3.1.1 All genres

Here are all the original genres. As you can see there are thousands of them, and most of them are obscure. Some seem to not even make sense (like the genre “[]”).

3.1.2 Top 100

These are the top 100 terms found across all the genres. These terms will be used to create the simplified genres. Note that some of the original genres are double counted; for example, the original genre african rock falls under both of the simplified genres african and rock.

3.1.3 Top 20

These are the top 20 terms found. They are the ones we will use. Note this only uses 60.7% of the data since 39.3% of the data do not fall under these top 20 categories.

3.1.4 Final Condensed Genre Dataset

We use these top 20 terms to create a more concisely labeled dataset. We also include the label other to account for the other 1743 genres, those with 39 or fewer occurrences.
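The relabeling can be sketched as follows. This is a minimal example under stated assumptions: `top_terms` holds only three hypothetical terms instead of the 20 we used, and `condense` is an illustrative helper name. A genre that matches several terms receives several labels, which reproduces the double counting noted earlier.

```python
import re

# Hypothetical top terms; the report uses the 20 most frequent terms.
top_terms = ["rock", "pop", "swedish"]

def condense(genre_name, terms):
    """Return every top term found in the genre name, or ["other"] if none match."""
    words = set(re.findall(r"[a-z]+", genre_name.lower()))
    matched = [term for term in terms if term in words]
    return matched if matched else ["other"]

# Hypothetical original genre names.
labels = {g: condense(g, top_terms)
          for g in ["african rock", "swedish pop", "deep house"]}
print(labels)
```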

3.1.5 Condensed genre counts

This graph shows the number of occurrences of each simplified genre.

3.1.6 Condensed genre counts (removed “other”)

This graph shows the number of occurrences of each simplified genre without the “other” category (so the scale is slightly easier to read).

3.2 Feature Correlations

3.2.1 r-values

Our first question is whether any features have strong linear correlations with other features. To answer it, we use r-values and their corresponding graphs.

3.2.1.1 All Combinations of Features

First, we computed the r-value for every pair of features, which is shown in the table below.
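For reference, the r-value here is the Pearson correlation coefficient. The sketch below computes it for every pair of features in a small table; the feature values are made up for illustration, and `pearson_r` is a helper written for this example rather than the code we ran.

```python
import math
from itertools import combinations

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    return s_xy / math.sqrt(s_xx * s_yy)

# Hypothetical feature columns (not values from the dataset).
features = {
    "energy": [0.2, 0.4, 0.6, 0.8],
    "loudness": [-20.0, -15.0, -10.0, -5.0],
    "acousticness": [0.9, 0.7, 0.4, 0.1],
}

# One r-value per unordered pair of features.
r_values = {(a, b): pearson_r(features[a], features[b])
            for a, b in combinations(features, 2)}
for pair, r in r_values.items():
    print(pair, round(r, 3))
```

With 11 features there are 11 * 10 / 2 = 55 unordered pairs, which matches the total number of r-values reported when no threshold is applied.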

3.2.1.1.1 Raw r-values

The raw r-values between every combination of features.

3.2.1.1.2 Correlation plot

The r-values in an easier-to-view format. The red and blue colors signify stronger correlations.

3.2.1.2 Filter r values

As the table above shows, some features seem to have strong linear relationships, while others do not. To isolate the strongest feature relations, we kept only the r-values whose absolute value is over 0.9. We chose 0.9 as a threshold arbitrarily, since there were many features that correlated. You can see some other thresholds below.
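The filtering step amounts to comparing the absolute value of each r-value with the threshold. A minimal sketch, in which the r-values are invented for illustration and are not results from our analysis:

```python
# Hypothetical r-values keyed by feature pair (not results from the report).
r_values = {
    ("energy", "loudness"): 0.93,
    ("energy", "valence"): 0.35,
    ("acousticness", "energy"): -0.95,
    ("tempo", "liveness"): -0.62,
}

def filter_r(r_values, threshold):
    """Keep the pairs whose |r| is above the threshold."""
    return {pair: r for pair, r in r_values.items() if abs(r) > threshold}

strong = filter_r(r_values, 0.9)
print(strong)
```

Taking the absolute value keeps strong negative correlations as well as strong positive ones.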

3.2.1.2.1 threshold = 0.9

There are 3 r-values above this threshold.

3.2.1.2.2 0.8

There are 10 r-values above this threshold.

3.2.1.2.3 0.7

There are 11 r-values above this threshold.

3.2.1.2.4 0.6

There are 18 r-values above this threshold.

3.2.1.2.5 none

With no threshold applied, there are 55 r-values.

3.2.2 Top correlated features

Energy is included in each of the three strongest correlations. Energy is strongly related to acousticness, loudness, and tempo. The three plots show each of these features graphed against energy. Each graph includes a fitted line showing the approximate linear regression. While an increase in acousticness relates to a decrease in energy, an increase in loudness or tempo relates to an increase in energy.
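A regression line of this kind can be obtained with ordinary least squares. The sketch below fits slope and intercept by hand on made-up acousticness and energy values; `fit_line` is an illustrative helper, and our plots may have used a plotting library's built-in linear fit instead.

```python
def fit_line(x, y):
    """Ordinary least-squares slope and intercept for y as a function of x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    slope = (sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
             / sum((a - mean_x) ** 2 for a in x))
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical values (not taken from the dataset).
acousticness = [0.1, 0.3, 0.5, 0.7, 0.9]
energy = [0.9, 0.7, 0.6, 0.4, 0.2]

slope, intercept = fit_line(acousticness, energy)
print(slope, intercept)
```

A negative slope corresponds to a negative correlation, as in the acousticness plot, and a positive slope to a positive one.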

3.2.2.1 Acousticness vs Energy

A negative correlation.

3.2.2.2 Loudness vs Energy

A positive correlation.

3.2.2.3 Tempo vs Energy

A positive correlation.

3.2.3 Tangential Correlations

In the graph and table below, we can again see that there are strong correlations between energy and the features acousticness, loudness, and tempo. However, it is also interesting to note that the same three features that correlate strongly with energy also correlate with each other, although to a lesser degree.

It is hard to say exactly what this means, but it does suggest a few possibilities and speaks to the difference between correlation and causation. For example, there is an r-value of -0.8715355 between tempo and loudness. However, since we know that both of those features correlate even more strongly with energy, their relation to energy may be what is more significant. These features are all highly related, and the fact that they also correlate highly with each other suggests that they all measure something similar.

3.2.3.1 Graph

Plotted feature vs feature.

3.2.3.2 Table

The raw r-values.

3.3 Genre Analysis

3.3.1 Density Plots

We created density plots to get an initial idea of how the genres relate to each feature. These plots are messy, so in the next section we will try to make sense of them. In particular, we will analyze the popularity feature as it relates to each genre.

3.3.1.1 popularity

3.3.1.2 acousticness

3.3.1.3 danceability

3.3.1.4 duration_ms

3.3.1.5 energy

3.3.1.6 instrumentalness

3.3.1.7 liveness

3.3.1.8 loudness

3.3.1.9 speechiness

3.3.1.10 tempo

3.3.1.11 valence

3.3.2 t-tests

We ran t-tests to find differences between genres in the different features. The t-test statistic 3 is defined as follows:

  • Let \(a\) and \(b\) be the two populations.
  • Let \(m_a\) and \(m_b\) be the means of each population.
  • Let \(s_a\) and \(s_b\) be the standard deviations of each population.
  • Let \(n_a\) and \(n_b\) be the sizes of each population.

\[ t = \frac{m_a - m_b}{\sqrt{\frac{s_a^2}{n_a}+\frac{s_b^2}{n_b}}} \]

We use this test statistic to calculate the p-value by finding the corresponding quantile from the Student's t distribution with \(\max(n_a, n_b)-1\) degrees of freedom. While we will focus on analyzing popularity in particular in the next section, we run this test between every combination of genres for all features.
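The statistic and degrees of freedom defined in this section can be sketched in code as follows. Several things here are assumptions made for illustration: the sample values are invented, the p-value is taken to be two-sided, and the tail area of the t distribution is approximated with Simpson's rule so that the example needs only the standard library. A statistics package would normally supply the distribution function.

```python
import math
import statistics

def t_pdf(x, df):
    """Density of the Student's t distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))

def two_sided_p(t, df, steps=2000):
    """Two-sided p-value (assumed), integrating the density over [0, |t|]
    with Simpson's rule; steps must be even."""
    upper = abs(t)
    if upper == 0:
        return 1.0
    h = upper / steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return max(0.0, 1.0 - 2.0 * area)

def t_test(a, b):
    """t statistic, degrees of freedom, and p-value as defined in this section."""
    m_a, m_b = statistics.mean(a), statistics.mean(b)
    s_a, s_b = statistics.stdev(a), statistics.stdev(b)
    n_a, n_b = len(a), len(b)
    t = (m_a - m_b) / math.sqrt(s_a ** 2 / n_a + s_b ** 2 / n_b)
    df = max(n_a, n_b) - 1  # degrees of freedom as stated in the text
    return t, df, two_sided_p(t, df)

# Hypothetical popularity samples for two genres.
t, df, p = t_test([1, 2, 3, 4, 5], [2, 3, 4, 5, 6])
print(t, df, p)
```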

3.3.2.1 acousticness

All t-tests between genres for acousticness.

3.3.2.2 danceability

All t-tests between genres for danceability.

3.3.2.3 duration_ms

All t-tests between genres for duration_ms.

3.3.2.4 energy

All t-tests between genres for energy.

3.3.2.5 instrumentalness

All t-tests between genres for instrumentalness.

3.3.2.6 liveness

All t-tests between genres for liveness.

3.3.2.7 loudness

All t-tests between genres for loudness.

3.3.2.8 speechiness

All t-tests between genres for speechiness.

3.3.2.9 tempo

All t-tests between genres for tempo.

3.3.2.10 valence

All t-tests between genres for valence.

3.3.2.11 popularity

All t-tests between genres for popularity.

Filtered for only significant differences (p-value < 0.05).

3.3.3 Popularity

We decided to further examine the popularity of the genres in depth to see if we could discover the most popular genre(s).

We made a graph comparing box plots of popularity for all the genres in order to get an overview of the distributions before jumping into our t-tests. Overall, the box plots show that rap has the highest mean popularity of all the genres. Next, we will use t-tests to evaluate whether the difference in the means of the genres is significant enough for us to conclude that rap has the highest mean popularity.

Additionally, we made a density plot of our estimator, the mean popularity, to check that it looks normally distributed. We did this to confirm that using a t-test is appropriate. As you can see, the plot seems normally distributed, so a t-test is appropriate.

We identified the most popular genre by finding the genre with the highest mean popularity. Here you can see that the genre with the highest mean popularity was rap.
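Finding the genre with the highest mean popularity is a group-and-average operation. A minimal sketch, using invented (genre, popularity) rows rather than our data; `mean_by_genre` is an illustrative helper name:

```python
from collections import defaultdict
from statistics import mean

# Hypothetical (genre, popularity) rows, not taken from the dataset.
rows = [("rap", 60), ("rap", 70), ("rock", 40), ("rock", 50), ("jazz", 20)]

def mean_by_genre(rows):
    """Average popularity for each genre."""
    groups = defaultdict(list)
    for genre, popularity in rows:
        groups[genre].append(popularity)
    return {genre: mean(values) for genre, values in groups.items()}

means = mean_by_genre(rows)
top_genre = max(means, key=means.get)
print(top_genre, means[top_genre])
```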

We then ran the t-test for the feature popularity between all genres.

We then isolated the genres whose p-value was not below 0.05. These genres cannot be ruled out as being as popular as rap.

To put these results back into context, we show the mean and standard deviation of popularity from these genres. As you can see, their means were very similar, so it makes sense that their p-values were not significant.

The following graph shows the mean popularity for hip, rap, and swedish.

4 Discussion

From our t-tests comparing the genres' mean popularity, we were not able to come to a definite conclusion about which genre is the most popular. Because not all of the p-values are below .05, we do not have statistical evidence that the rap mean is significantly higher than the means of all the other genres. Hip and swedish have p-values above .05, so any of the three could still be the most popular. However, for the genres with p-values below .05, we do have statistically significant evidence that they are not the most popular.

We also discovered which features have the strongest linear correlations with each other and which features have no linear relationship. We found that energy has correlations with an absolute value over 0.9 with three other features: acousticness, loudness, and tempo. Acousticness has a negative correlation with energy, while loudness and tempo both have positive correlations with energy. Considering that acousticness, loudness, and tempo are all based on set measurements, while energy is calculated from intensity and activity in the song, this suggests that acousticness, loudness, and tempo all contribute to the energy of a song.

A shortcoming of our analysis is that we do not know how many songs are included in the data for each genre. Some genres' data may be based on more songs than other genres'. In addition, because we used only the top 20 most frequent terms to group genres, some of the genres are not included in our analysis.

Future work on this dataset could involve testing more of the features' relationships and seeing whether they yield strong models. We could also look for datasets from other music streaming services, such as Apple Music and Pandora.

5 References